This study presents a comprehensive transcriptomic analysis of lung tissue from individuals with chronic obstructive pulmonary disease (COPD) compared to controls using the GEO dataset GSE76925. After quality control filtering based on probe detection p-values and log2 transformation of intensities, unsupervised principal component analysis revealed structured transcriptional variation across samples. While the dominant variance axis (PC1) was not directly associated with disease status, a secondary axis (PC2) demonstrated significant separation between COPD and control groups, suggesting the presence of a robust but heterogeneous disease-associated transcriptional signal.

Supervised differential expression modelling using limma with empirical Bayes moderation identified widespread gene dysregulation in COPD lung tissue. Pathway-level analysis through Gene Ontology, Reactome over-representation analysis, and rank-based Gene Set Enrichment Analysis (GSEA) consistently highlighted coordinated upregulation of extracellular matrix remodeling and SLIT–ROBO signaling pathways, alongside downregulation of translational and ribosomal biogenesis programs. Together, these findings indicate that COPD is characterised not by isolated gene-level perturbations but by systematic reprogramming of structural and biosynthetic pathways, consistent with chronic tissue injury and airway remodeling.

1 Step A — Load packages and import expression matrix

This notebook analyses the microarray dataset GSE76925 (COPD vs control lung tissue).
The raw file includes two types of columns:

1.1 Step A.1 — Install (if needed) and load required packages

# CRAN + Bioconductor package helper
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

load_or_install <- function(pkgs, bioc = FALSE) {
  for (p in pkgs) {
    if (!requireNamespace(p, quietly = TRUE)) {
      if (bioc) BiocManager::install(p, ask = FALSE, update = FALSE) else install.packages(p)
    }
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
}

if (!requireNamespace("pheatmap", quietly = TRUE)) install.packages("pheatmap")

# CRAN packages
load_or_install(c("tidyverse", "data.table", "pheatmap"))

# Bioconductor packages
load_or_install(c("GEOquery", "limma", "clusterProfiler", "org.Hs.eg.db", "enrichplot", "ReactomePA"),
                bioc = TRUE)

1.2 Step A.2 — Read the expression file

# Read gz file directly
expr_raw <- read.delim(
  "GSE76925_non-normalized (2).txt.gz",
  header = TRUE,
  sep = "\t",
  check.names = FALSE
)

dim(expr_raw)
## [1] 47249   302
head(expr_raw[, 1:6])

Interpretation: columns such as sample1 are intensities; columns such as sample1.Detection.Pval are QC flags for that probe in that sample.

1.3 Step A.3 — Split intensities vs detection p-values

What we do: separate the raw table into two matrices.

Why it matters: detection p-values are not expression. If you leave them in the expression matrix, PCA, differential expression, and clustering become meaningless.

# Expression columns are those NOT ending in ".Detection.Pval"
expr_mat <- expr_raw[, !grepl("Detection\\.Pval$", colnames(expr_raw)), drop = FALSE]

# Detection p-value columns DO end in ".Detection.Pval"
detp_mat <- expr_raw[, grepl("Detection\\.Pval$", colnames(expr_raw)), drop = FALSE]

# Sanity checks
dim(expr_mat)
## [1] 47249   151
dim(detp_mat)
## [1] 47249   151
head(colnames(expr_mat), 6)
## [1] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6"
head(colnames(detp_mat), 6)
## [1] "sample1.Detection.Pval" "sample2.Detection.Pval" "sample3.Detection.Pval"
## [4] "sample4.Detection.Pval" "sample5.Detection.Pval" "sample6.Detection.Pval"

2 Step B — QC filtering and transformation

2.1 Step B.1 — Filter probes that are rarely detected

What we do: keep probes detected (detection p-value < 0.01) in at least 20% of samples.

Why it matters: microarrays include probes that are effectively noise in a given tissue. Keeping them inflates the multiple testing burden and adds noise to PCA and modelling.

detected <- detp_mat < 0.01
keep <- rowMeans(detected, na.rm = TRUE) >= 0.20

expr_f <- expr_mat[keep, , drop = FALSE]
dim(expr_f)
## [1] 21618   151

2.2 Step B.2 — Log2 transform intensities

What we do: log2(x + 1).

Why it matters: intensities are right-skewed and variance increases with the mean. Log transformation stabilises variance and improves linear modelling assumptions.

expr_log <- log2(expr_f + 1)
summary(as.vector(expr_log))
##           Length Class  Mode   
## sample1   21618  -none- numeric
## sample2   21618  -none- numeric
## sample3   21618  -none- numeric
## sample4   21618  -none- numeric
## sample5   21618  -none- numeric
## sample6   21618  -none- numeric
## sample7   21618  -none- numeric
## sample8   21618  -none- numeric
## sample9   21618  -none- numeric
## sample10  21618  -none- numeric
## sample11  21618  -none- numeric
## sample12  21618  -none- numeric
## sample13  21618  -none- numeric
## sample14  21618  -none- numeric
## sample15  21618  -none- numeric
## sample16  21618  -none- numeric
## sample17  21618  -none- numeric
## sample18  21618  -none- numeric
## sample19  21618  -none- numeric
## sample20  21618  -none- numeric
## sample21  21618  -none- numeric
## sample22  21618  -none- numeric
## sample23  21618  -none- numeric
## sample24  21618  -none- numeric
## sample25  21618  -none- numeric
## sample26  21618  -none- numeric
## sample27  21618  -none- numeric
## sample28  21618  -none- numeric
## sample29  21618  -none- numeric
## sample30  21618  -none- numeric
## sample31  21618  -none- numeric
## sample32  21618  -none- numeric
## sample33  21618  -none- numeric
## sample34  21618  -none- numeric
## sample35  21618  -none- numeric
## sample36  21618  -none- numeric
## sample37  21618  -none- numeric
## sample38  21618  -none- numeric
## sample39  21618  -none- numeric
## sample40  21618  -none- numeric
## sample41  21618  -none- numeric
## sample42  21618  -none- numeric
## sample43  21618  -none- numeric
## sample44  21618  -none- numeric
## sample45  21618  -none- numeric
## sample46  21618  -none- numeric
## sample47  21618  -none- numeric
## sample48  21618  -none- numeric
## sample49  21618  -none- numeric
## sample50  21618  -none- numeric
## sample51  21618  -none- numeric
## sample52  21618  -none- numeric
## sample53  21618  -none- numeric
## sample54  21618  -none- numeric
## sample55  21618  -none- numeric
## sample56  21618  -none- numeric
## sample57  21618  -none- numeric
## sample58  21618  -none- numeric
## sample59  21618  -none- numeric
## sample60  21618  -none- numeric
## sample61  21618  -none- numeric
## sample62  21618  -none- numeric
## sample63  21618  -none- numeric
## sample64  21618  -none- numeric
## sample65  21618  -none- numeric
## sample66  21618  -none- numeric
## sample67  21618  -none- numeric
## sample68  21618  -none- numeric
## sample69  21618  -none- numeric
## sample70  21618  -none- numeric
## sample71  21618  -none- numeric
## sample72  21618  -none- numeric
## sample73  21618  -none- numeric
## sample74  21618  -none- numeric
## sample75  21618  -none- numeric
## sample76  21618  -none- numeric
## sample77  21618  -none- numeric
## sample78  21618  -none- numeric
## sample79  21618  -none- numeric
## sample80  21618  -none- numeric
## sample81  21618  -none- numeric
## sample82  21618  -none- numeric
## sample83  21618  -none- numeric
## sample84  21618  -none- numeric
## sample85  21618  -none- numeric
## sample86  21618  -none- numeric
## sample87  21618  -none- numeric
## sample88  21618  -none- numeric
## sample89  21618  -none- numeric
## sample90  21618  -none- numeric
## sample91  21618  -none- numeric
## sample92  21618  -none- numeric
## sample93  21618  -none- numeric
## sample94  21618  -none- numeric
## sample95  21618  -none- numeric
## sample96  21618  -none- numeric
## sample97  21618  -none- numeric
## sample98  21618  -none- numeric
## sample99  21618  -none- numeric
## sample100 21618  -none- numeric
## sample101 21618  -none- numeric
## sample102 21618  -none- numeric
## sample103 21618  -none- numeric
## sample104 21618  -none- numeric
## sample105 21618  -none- numeric
## sample106 21618  -none- numeric
## sample107 21618  -none- numeric
## sample108 21618  -none- numeric
## sample109 21618  -none- numeric
## sample110 21618  -none- numeric
## sample111 21618  -none- numeric
## sample112 21618  -none- numeric
## sample113 21618  -none- numeric
## sample114 21618  -none- numeric
## sample115 21618  -none- numeric
## sample116 21618  -none- numeric
## sample117 21618  -none- numeric
## sample118 21618  -none- numeric
## sample119 21618  -none- numeric
## sample120 21618  -none- numeric
## sample121 21618  -none- numeric
## sample122 21618  -none- numeric
## sample123 21618  -none- numeric
## sample124 21618  -none- numeric
## sample125 21618  -none- numeric
## sample126 21618  -none- numeric
## sample127 21618  -none- numeric
## sample128 21618  -none- numeric
## sample129 21618  -none- numeric
## sample130 21618  -none- numeric
## sample131 21618  -none- numeric
## sample132 21618  -none- numeric
## sample133 21618  -none- numeric
## sample134 21618  -none- numeric
## sample135 21618  -none- numeric
## sample136 21618  -none- numeric
## sample137 21618  -none- numeric
## sample138 21618  -none- numeric
## sample139 21618  -none- numeric
## sample140 21618  -none- numeric
## sample141 21618  -none- numeric
## sample142 21618  -none- numeric
## sample143 21618  -none- numeric
## sample144 21618  -none- numeric
## sample145 21618  -none- numeric
## sample146 21618  -none- numeric
## sample147 21618  -none- numeric
## sample148 21618  -none- numeric
## sample149 21618  -none- numeric
## sample150 21618  -none- numeric
## sample151 21618  -none- numeric

3 Step C — Download GEO sample metadata and define COPD status

Differential expression needs phenotype labels (COPD vs control). We pull these from GEO.

3.1 Step C.1 — Download GEO metadata and extract the phenotype table

gse <- getGEO("GSE76925", GSEMatrix = TRUE)
eset <- gse[[1]]
pheno <- pData(eset)

dim(pheno)
## [1] 151  58
colnames(pheno)[1:20]
##  [1] "title"                  "geo_accession"          "status"                
##  [4] "submission_date"        "last_update_date"       "type"                  
##  [7] "channel_count"          "source_name_ch1"        "organism_ch1"          
## [10] "characteristics_ch1"    "characteristics_ch1.1"  "characteristics_ch1.2" 
## [13] "characteristics_ch1.3"  "characteristics_ch1.4"  "characteristics_ch1.5" 
## [16] "characteristics_ch1.6"  "characteristics_ch1.7"  "characteristics_ch1.8" 
## [19] "characteristics_ch1.9"  "characteristics_ch1.10"
head(pheno[, 1:10])

3.2 Step C.2 — Confirm sample counts and align sample IDs

Goal: make sure expression columns and phenotype rows refer to the same samples.

ncol(expr_log)
## [1] 151
nrow(pheno)
## [1] 151
# Replace generic column names with GSM IDs from GEO
colnames(expr_log) <- pheno$geo_accession
head(colnames(expr_log))
## [1] "GSM2040792" "GSM2040793" "GSM2040794" "GSM2040795" "GSM2040796"
## [6] "GSM2040797"

3.3 Step C.3 — Create the COPD vs control grouping variable

Approach used here: parse the title field.

If your dataset uses a different text pattern, change the grepl() rule (this is the only fragile part).

group <- ifelse(grepl("case", pheno$title, ignore.case = TRUE), "COPD", "Control")
table(group)
## group
## Control    COPD 
##      40     111
group <- factor(group, levels = c("Control", "COPD"))

4 Step D — Exploratory PCA (unsupervised)

PCA is a sanity check: it tells you whether samples cluster by phenotype, whether outliers exist, and whether the main variance is disease-related or driven by other effects.

4.1 Step D.1 — PCA with scaling across genes

Scaling makes each probe contribute equally. This can help sometimes, but it can also over-weight noisy probes.

pca_scaled <- prcomp(t(expr_log), center = TRUE, scale. = TRUE)

pca_df <- data.frame(
  PC1 = pca_scaled$x[, 1],
  PC2 = pca_scaled$x[, 2],
  group = group
)

ggplot(pca_df, aes(PC1, PC2, colour = group)) +
  geom_point(size = 3, alpha = 0.85) +
  stat_ellipse(aes(fill = group), geom = "polygon", alpha = 0.10, linetype = 2) +
  theme_minimal(base_size = 13) +
  labs(title = "PCA (scaled) - GSE76925", x = "PC1", y = "PC2", colour = "Group") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

cat(
  "Interpretation (PCA — scaled):\n",
  "- Each point is one sample; distance between points reflects overall transcriptomic similarity.\n",
  "- PC1 and PC2 are the top two axes of variation in the dataset.\n",
  "- If COPD and Control form visibly different clusters, that suggests a strong disease-driven expression signal.\n",
  "- If they overlap heavily, the disease signal is weaker than other factors (e.g., batch effects, smoking status, cell-type composition).\n",
  "- Outliers far from the main cloud may indicate technical artefacts or biologically distinct samples.\n",
  sep = ""
)
## Interpretation (PCA — scaled):
## - Each point is one sample; distance between points reflects overall transcriptomic similarity.
## - PC1 and PC2 are the top two axes of variation in the dataset.
## - If COPD and Control form visibly different clusters, that suggests a strong disease-driven expression signal.
## - If they overlap heavily, the disease signal is weaker than other factors (e.g., batch effects, smoking status, cell-type composition).
## - Outliers far from the main cloud may indicate technical artefacts or biologically distinct samples.

4.2 Step D.2 — PCA without scaling across genes

No scaling preserves the natural variance structure across genes, which is often reasonable for log-transformed microarray data.

pca_noscale <- prcomp(t(expr_log), center = TRUE, scale. = FALSE)

pca_df2 <- data.frame(
  PC1 = pca_noscale$x[, 1],
  PC2 = pca_noscale$x[, 2],
  group = group
)

ggplot(pca_df2, aes(PC1, PC2, colour = group)) +
  geom_point(size = 3, alpha = 0.85) +
  stat_ellipse(aes(fill = group), geom = "polygon", alpha = 0.12, linetype = 2) +
  theme_minimal(base_size = 13) +
  labs(title = "PCA (no scaling) - GSE76925", x = "PC1", y = "PC2", colour = "Group") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

cat(
  "Interpretation (PCA — no scaling):\n",
  "- This PCA keeps genes at their natural variance levels (high-variance genes influence PCs more).\n",
  "- Often more realistic for microarrays because it preserves dominant biological variation.\n",
  "- If separation improves here compared with the scaled PCA, it suggests a smaller subset of high-variance genes drives group structure.\n",
  "- If separation worsens, scaling may have helped expose subtle disease signal.\n",
  sep = ""
)
## Interpretation (PCA — no scaling):
## - This PCA keeps genes at their natural variance levels (high-variance genes influence PCs more).
## - Often more realistic for microarrays because it preserves dominant biological variation.
## - If separation improves here compared with the scaled PCA, it suggests a smaller subset of high-variance genes drives group structure.
## - If separation worsens, scaling may have helped expose subtle disease signal.

4.3 Step D.3 — Test whether PCs differ by COPD status

These tests help confirm whether disease status aligns with a major variance axis.

pc1_scores <- pca_noscale$x[, 1]
pc2_scores <- pca_noscale$x[, 2]

t.test(pc1_scores ~ group)
## 
##  Welch Two Sample t-test
## 
## data:  pc1_scores by group
## t = 0.1672, df = 63.565, p-value = 0.8677
## alternative hypothesis: true difference in means between group Control and group COPD is not equal to 0
## 95 percent confidence interval:
##  -31.22036  36.92301
## sample estimates:
## mean in group Control    mean in group COPD 
##             2.0960079            -0.7553182
t.test(pc2_scores ~ group)
## 
##  Welch Two Sample t-test
## 
## data:  pc2_scores by group
## t = -6.519, df = 103.9, p-value = 2.602e-09
## alternative hypothesis: true difference in means between group Control and group COPD is not equal to 0
## 95 percent confidence interval:
##  -46.74392 -24.93842
## sample estimates:
## mean in group Control    mean in group COPD 
##             -26.34682               9.49435
df_pc1 <- data.frame(PC1 = pc1_scores, group = group)
df_pc2 <- data.frame(PC2 = pc2_scores, group = group)

ggplot(df_pc1, aes(group, PC1, fill = group)) +
  geom_boxplot(alpha = 0.6) +
  theme_minimal(base_size = 13) +
  labs(title = "PC1 by group (no scaling)", x = NULL, y = "PC1") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "none")

cat(
  "Interpretation (PC1 by group):\n",
  "- This tests whether the main axis of variation (PC1) differs between COPD and Control.\n",
  "- A non-significant p-value means PC1 is likely driven by non-disease factors (e.g., technical batch, tissue composition, smoking).\n",
  "- A significant p-value suggests COPD status contributes strongly to global variation in expression.\n",
  sep = ""
)
## Interpretation (PC1 by group):
## - This tests whether the main axis of variation (PC1) differs between COPD and Control.
## - A non-significant p-value means PC1 is likely driven by non-disease factors (e.g., technical batch, tissue composition, smoking).
## - A significant p-value suggests COPD status contributes strongly to global variation in expression.
ggplot(df_pc2, aes(group, PC2, fill = group)) +
  geom_boxplot(alpha = 0.6) +
  theme_minimal(base_size = 13) +
  labs(title = "PC2 by group (no scaling)", x = NULL, y = "PC2") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "none")

cat(
  "\n\nInterpretation (PC2 by group):\n",
  "- This tests whether the second major axis (PC2) is associated with COPD status.\n",
  "- A significant p-value here implies a robust disease-related transcriptional axis, even if PC1 is dominated by other factors.\n",
  "- This is common in human lung datasets where cell mixture/batch drives PC1 and disease signal appears on PC2 or PC3.\n",
  sep = ""
)
## 
## 
## Interpretation (PC2 by group):
## - This tests whether the second major axis (PC2) is associated with COPD status.
## - A significant p-value here implies a robust disease-related transcriptional axis, even if PC1 is dominated by other factors.
## - This is common in human lung datasets where cell mixture/batch drives PC1 and disease signal appears on PC2 or PC3.

5 Step E — Differential expression with limma

We model each probe as a function of COPD status. limma fits linear models efficiently and uses empirical Bayes shrinkage for stable inference.

5.1 Step E.1 — Fit limma model

design <- model.matrix(~ group)
colnames(design)
## [1] "(Intercept)" "groupCOPD"
fit <- lmFit(expr_log, design)
fit <- eBayes(fit)

results <- topTable(
  fit,
  coef = "groupCOPD",
  number = Inf,
  adjust.method = "BH"
)

head(results)
sum(results$adj.P.Val < 0.05)
## [1] 1812

5.2 Step E.2 Heatmap of top DE probes

# Visualising the top differentially expressed probes across samples
library(pheatmap)
library(RColorBrewer)

topN       <- 50
top_probes <- rownames(results[order(results$adj.P.Val), ])[1:topN]
mat_top    <- expr_log[top_probes, , drop = FALSE]
mat_z      <- t(scale(t(mat_top)))  # z-score per gene

# Annotation
ann_col <- data.frame(Group = group)
rownames(ann_col) <- colnames(mat_z)

# Custom colors
ann_colors <- list(
  Group = c(COPD = "firebrick", Control = "steelblue")  # adjust names to match your group levels
)

pheatmap(
  mat_z,
  annotation_col  = ann_col,
  annotation_colors = ann_colors,
  color           = colorRampPalette(rev(brewer.pal(9, "RdBu")))(100),
  show_colnames   = FALSE,
  show_rownames   = TRUE,
  fontsize_row    = 7,
  fontsize        = 11,
  border_color    = NA,                      # removes grid lines between cells
  clustering_distance_rows = "euclidean",
  clustering_distance_cols = "euclidean",
  clustering_method        = "complete",
  main            = paste("Top", topN, "DE Probes (Z-scored)"),
  treeheight_row  = 30,
  treeheight_col  = 30
)

cat(
  "Interpretation (Heatmap of top DE probes):\n",
  "- Rows are the top differentially expressed probes/genes; columns are samples.\n",
  "- Values are z-scored per gene (row-wise), so colours show relative up/down expression within each gene.\n",
  "- Separation of COPD vs Control in clustering suggests a coherent disease signature.\n",
  "- Mixed clustering suggests heterogeneity or confounding effects.\n",
  "- Isolated samples may be outliers and should be checked in PCA/QC.\n",
  sep = ""
)
## Interpretation (Heatmap of top DE probes):
## - Rows are the top differentially expressed probes/genes; columns are samples.
## - Values are z-scored per gene (row-wise), so colours show relative up/down expression within each gene.
## - Separation of COPD vs Control in clustering suggests a coherent disease signature.
## - Mixed clustering suggests heterogeneity or confounding effects.
## - Isolated samples may be outliers and should be checked in PCA/QC.

5.3 Step E.3 — Volcano and MA plots (visual QC)

res_df <- results %>%
  dplyr::mutate(
    ProbeID = rownames(results),
    negLog10FDR = -log10(adj.P.Val),
    sig = adj.P.Val < 0.05 & abs(logFC) > 0.5
  )

library(ggplot2)

# Volcano Plot 
ggplot(res_df, aes(x = logFC, y = negLog10FDR)) +
  geom_point(aes(color = sig), alpha = 0.7, size = 2) +
  geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "gray40", linewidth = 0.7) +
  geom_vline(xintercept = c(-1, 1),     linetype = "dashed", color = "gray40", linewidth = 0.7) +
  scale_color_manual(
    values = c("FALSE" = "steelblue", "TRUE" = "firebrick"),
    labels = c("FALSE" = "Not Significant", "TRUE" = "Significant")
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title   = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.title   = element_text(face = "bold"),
    legend.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(fill = "white", color = NA)
  ) +
  labs(
    title = "Volcano Plot — COPD vs Control",
    x     = "Log2 Fold Change",
    y     = "-log10(FDR)",
    color = "Significance"
  )

cat(
  "Interpretation (Volcano Plot):\n",
  "- Each point is one probe/gene.\n",
  "- X-axis = log2 fold change (direction + magnitude of change in COPD vs Control).\n",
  "- Y-axis = -log10(FDR), controlling for multiple testing.; higher points are more statistically significant.\n",
  "- Points far left/right and high up are the most compelling candidates: large effect + strong significance.\n",
  "- Dashed lines mark FDR = 0.05 (horizontal) and logFC = ±1 (vertical) cutoffs.\n",
  "- A symmetric volcano suggests balanced up/down changes; strong skew may indicate global shifts or confounding.\n",
  sep = ""
)
## Interpretation (Volcano Plot):
## - Each point is one probe/gene.
## - X-axis = log2 fold change (direction + magnitude of change in COPD vs Control).
## - Y-axis = -log10(FDR), controlling for multiple testing.; higher points are more statistically significant.
## - Points far left/right and high up are the most compelling candidates: large effect + strong significance.
## - Dashed lines mark FDR = 0.05 (horizontal) and logFC = ±1 (vertical) cutoffs.
## - A symmetric volcano suggests balanced up/down changes; strong skew may indicate global shifts or confounding.
# MA Plot
ggplot(res_df, aes(x = AveExpr, y = logFC)) +
  geom_point(aes(color = sig), alpha = 0.7, size = 2) +
  geom_hline(yintercept =  0, linetype = "dashed", color = "gray40", linewidth = 0.7) +
  geom_hline(yintercept =  1, linetype = "dotted", color = "firebrick", linewidth = 0.6) +
  geom_hline(yintercept = -1, linetype = "dotted", color = "steelblue", linewidth = 0.6) +
  scale_color_manual(
    values = c("FALSE" = "steelblue", "TRUE" = "firebrick"),
    labels = c("FALSE" = "Not Significant", "TRUE" = "Significant")
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title   = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.title   = element_text(face = "bold"),
    legend.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(fill = "white", color = NA)
  ) +
  labs(
    title = "MA Plot — COPD vs Control",
    x     = "Average Expression",
    y     = "Log2 Fold Change",
    color = "Significance"
  )

cat(
  "Interpretation (MA Plot):\n",
  "- Each point is a probe/gene.\n",
  "- X-axis = average expression (mean intensity across all samples).\n",
  "- Y-axis = log2 fold change (COPD vs Control).\n",
  "- This checks whether fold-changes depend on expression level.\n",
  "- A funnel shape is normal (more noise at low expression).\n",
  "- Dotted red/blue lines mark logFC = ±1 reference cutoffs.\n",
  "- If extreme fold-changes occur only at very low expression, results may be noise-driven.\n",
  sep = ""
)
## Interpretation (MA Plot):
## - Each point is a probe/gene.
## - X-axis = average expression (mean intensity across all samples).
## - Y-axis = log2 fold change (COPD vs Control).
## - This checks whether fold-changes depend on expression level.
## - A funnel shape is normal (more noise at low expression).
## - Dotted red/blue lines mark logFC = ±1 reference cutoffs.
## - If extreme fold-changes occur only at very low expression, results may be noise-driven.

5.4 Step E.4 — Top hits (quick tables)

top_up   <- results[results$logFC > 0 & results$adj.P.Val < 0.05, ][1:10, ]
top_down <- results[results$logFC < 0 & results$adj.P.Val < 0.05, ][1:10, ]

top_up
top_down

6 Step F — Map Illumina probe IDs to gene symbols and Entrez IDs

The limma result rownames are Illumina probe IDs (for example, ILMN_1676938). Pathway tools usually expect gene symbols or Entrez IDs.

6.1 Step F.1 — Download platform annotation (GPL10558)

gpl <- getGEO("GPL10558")
gpl_tab <- Table(gpl)

# Build a mapping table from probe -> SYMBOL and Entrez
annot_map <- gpl_tab %>%
  dplyr::transmute(
    ProbeID  = as.character(ID),
    SYMBOL   = trimws(as.character(Symbol)),
    ENTREZID = trimws(as.character(Entrez_Gene_ID))
  )

# Clean empty strings
annot_map$SYMBOL[annot_map$SYMBOL %in% c("", "NA")] <- NA
annot_map$ENTREZID[annot_map$ENTREZID %in% c("", "NA")] <- NA

# If SYMBOL has multiple mappings like "A /// B" or "A;B", keep the first
annot_map$SYMBOL <- sub("///.*$", "", annot_map$SYMBOL)
annot_map$SYMBOL <- sub(";.*$",   "", annot_map$SYMBOL)
annot_map$SYMBOL <- trimws(annot_map$SYMBOL)

head(annot_map)

6.2 Step F.2 — Create an annotated results table for the strong signature

Definition used here: FDR < 0.05 and |logFC| > 0.5.

sig_strong <- results[results$adj.P.Val < 0.05 & abs(results$logFC) > 0.5, ]
sig_strong$ProbeID <- rownames(sig_strong)

annotated <- sig_strong %>%
  dplyr::left_join(annot_map, by = "ProbeID")

head(annotated[, c("ProbeID", "SYMBOL", "ENTREZID", "logFC", "adj.P.Val")])
mean(!is.na(annotated$SYMBOL))
## [1] 0.9897523
mean(!is.na(annotated$ENTREZID))
## [1] 0.9897523
table(is.na(annotated$SYMBOL))
## 
## FALSE  TRUE 
##  1159    12

7 Step G — Over-representation analysis (ORA)

ORA tests whether a set of “significant” genes is enriched for pathways compared with background.

7.1 Step G.1 — Build up/down Entrez lists

We use Entrez IDs because GO and Reactome tools are more robust with Entrez than symbols.

entrez_up <- annotated %>%
  dplyr::filter(logFC > 0) %>%
  dplyr::pull(ENTREZID) %>%
  na.omit() %>%
  unique()

entrez_down <- annotated %>%
  dplyr::filter(logFC < 0) %>%
  dplyr::pull(ENTREZID) %>%
  na.omit() %>%
  unique()

length(entrez_up)
## [1] 69
length(entrez_down)
## [1] 1045
head(entrez_up)
## [1] "3357" "761"  "348"  "6507" "6363" "2266"
head(entrez_down)
## [1] "649214"    "391359"    "643431"    "100130289" "344866"    "81688"

7.2 Step G.2 — GO Biological Process enrichment

ego_up <- enrichGO(
  gene          = entrez_up,
  OrgDb         = org.Hs.eg.db,
  keyType       = "ENTREZID",
  ont           = "BP",
  pAdjustMethod = "BH",
  qvalueCutoff  = 0.05,
  readable      = TRUE
)

ego_down <- enrichGO(
  gene          = entrez_down,
  OrgDb         = org.Hs.eg.db,
  keyType       = "ENTREZID",
  ont           = "BP",
  pAdjustMethod = "BH",
  qvalueCutoff  = 0.05,
  readable      = TRUE
)

head(as.data.frame(ego_up))
head(as.data.frame(ego_down))
dotplot(ego_up, showCategory = 15) +
  ggtitle("GO BP - Upregulated in COPD") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

cat(
  "Interpretation (GO BP enrichment — Upregulated in COPD):\n",
  "- This is over-representation analysis on genes upregulated in COPD.\n",
  "- Dot size = number of genes contributing to the term; colour = adjusted p-value (FDR).\n",
  "- Terms related to extracellular matrix, immune activation, chemotaxis, and remodelling are biologically coherent for COPD.\n",
  "- Highly redundant GO terms are common; focus on the consistent themes rather than individual term names.\n",
  sep = ""
)
## Interpretation (GO BP enrichment — Upregulated in COPD):
## - This is over-representation analysis on genes upregulated in COPD.
## - Dot size = number of genes contributing to the term; colour = adjusted p-value (FDR).
## - Terms related to extracellular matrix, immune activation, chemotaxis, and remodelling are biologically coherent for COPD.
## - Highly redundant GO terms are common; focus on the consistent themes rather than individual term names.
dotplot(ego_down, showCategory = 15) +
  ggtitle("GO BP - Downregulated in COPD") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

cat(
  "\n\nInterpretation (GO BP enrichment — Downregulated in COPD):\n",
  "- This is over-representation analysis on genes downregulated in COPD.\n",
  "- Enrichment in RNA processing, splicing, translation, or ribosome-related terms often reflects suppression of baseline biosynthetic programmes.\n",
  "- If few terms appear, it may mean: (i) fewer down genes, (ii) weaker effect sizes, or (iii) mapping drop-outs.\n",
  sep = ""
)
## 
## 
## Interpretation (GO BP enrichment — Downregulated in COPD):
## - This is over-representation analysis on genes downregulated in COPD.
## - Enrichment in RNA processing, splicing, translation, or ribosome-related terms often reflects suppression of baseline biosynthetic programmes.
## - If few terms appear, it may mean: (i) fewer down genes, (ii) weaker effect sizes, or (iii) mapping drop-outs.

7.3 Step G.3 — Reactome enrichment (ORA)

Reactome can be cleaner than GO for disease interpretation.

react_up <- enrichPathway(
  gene          = entrez_up,
  organism      = "human",
  pAdjustMethod = "BH",
  qvalueCutoff  = 0.05,
  readable      = TRUE
)

react_down <- enrichPathway(
  gene          = entrez_down,
  organism      = "human",
  pAdjustMethod = "BH",
  qvalueCutoff  = 0.05,
  readable      = TRUE
)

head(as.data.frame(react_up))
head(as.data.frame(react_down))
dotplot(react_up, showCategory = 15) + ggtitle("Reactome ORA - Up in COPD")

# Guard: do not plot if empty
if (nrow(as.data.frame(react_down)) == 0) {
  cat("Reactome ORA (down) returned no significant pathways at q < 0.05.\n",
      "This can happen if the down gene list is broad and non-specific, or if mapping reduces the list.\n")
} else {
  dotplot(react_down, showCategory = 15) + ggtitle("Reactome ORA - Down in COPD")
}
## Reactome ORA (down) returned no significant pathways at q < 0.05.
##  This can happen if the down gene list is broad and non-specific, or if mapping reduces the list.
cat(
  "Interpretation (Reactome enrichment — Up in COPD):\n",
  "- Reactome pathways are curated and often more interpretable than GO.\n",
  "- Enrichment for ECM organisation, collagen degradation, integrin signalling, and immune pathways fits COPD tissue remodelling.\n",
  "- Pathways with larger gene counts and low FDR are the strongest signals.\n",
  sep = ""
)
## Interpretation (Reactome enrichment — Up in COPD):
## - Reactome pathways are curated and often more interpretable than GO.
## - Enrichment for ECM organisation, collagen degradation, integrin signalling, and immune pathways fits COPD tissue remodelling.
## - Pathways with larger gene counts and low FDR are the strongest signals.
cat(
  "\n\nInterpretation (Reactome enrichment — Down in COPD):\n",
  "- If this output is empty (0 rows), it usually means no Reactome pathways pass the chosen FDR cutoff.\n",
  "- Common reasons: fewer mapped Entrez IDs for downregulated genes, weaker coordinated pathway structure, or conservative thresholds.\n",
  "- You can explore with a relaxed cutoff (e.g., qvalueCutoff = 0.10) or switch to ranked GSEA (which does not require a hard DE cutoff).\n",
  sep = ""
)
## 
## 
## Interpretation (Reactome enrichment — Down in COPD):
## - If this output is empty (0 rows), it usually means no Reactome pathways pass the chosen FDR cutoff.
## - Common reasons: fewer mapped Entrez IDs for downregulated genes, weaker coordinated pathway structure, or conservative thresholds.
## - You can explore with a relaxed cutoff (e.g., qvalueCutoff = 0.10) or switch to ranked GSEA (which does not require a hard DE cutoff).

8 Step H — Reactome GSEA (rank-based enrichment)

GSEA uses the full ranked signal rather than a hard significance cut-off. This is often more stable.

8.1 Step H.1 — Build the ranked gene list (limma t-statistic)

Important: use the full mapping table from GPL, not only the significant subset.

gene_list <- results$t

entrez_map_full <- annot_map %>%
  dplyr::select(ProbeID, ENTREZID) %>%
  dplyr::filter(!is.na(ENTREZID))

names(gene_list) <- entrez_map_full$ENTREZID[match(rownames(results), entrez_map_full$ProbeID)]

8.2 Step H.2 — Pre-GSEA cleaning (must be sorted and unique)

This block prevents the common GSEA failures: NA names, duplicated Entrez IDs, and unsorted vectors.

# Must be numeric and named
stopifnot(is.numeric(gene_list))
stopifnot(!is.null(names(gene_list)))

# Remove NA and non-finite values
gene_list <- gene_list[is.finite(gene_list)]
gene_list <- gene_list[!is.na(names(gene_list))]

# Make names character (Entrez IDs)
names(gene_list) <- as.character(names(gene_list))

# Sort decreasing and drop duplicates (keep the best-ranked instance)
gene_list <- sort(gene_list, decreasing = TRUE)
gene_list <- gene_list[!duplicated(names(gene_list))]

# Final check: sorted decreasing
all(diff(gene_list) <= 0, na.rm = TRUE)
## [1] TRUE
length(gene_list)
## [1] 15718
head(gene_list)
##     3357      761      348     6507     6363     2266 
## 5.038668 4.687792 4.685257 4.561199 4.363037 4.332848

8.3 Step H.3 — Run Reactome GSEA

gsea_reactome <- gsePathway(
  geneList      = gene_list,
  organism      = "human",
  pAdjustMethod = "BH",
  pvalueCutoff  = 0.05,
  verbose       = FALSE
)

head(as.data.frame(gsea_reactome))

8.4 Step H.4 — Visualise GSEA results

gsea_df <- as.data.frame(gsea_reactome)

if (is.null(gsea_reactome) || nrow(gsea_df) == 0) {
  cat("No significant Reactome GSEA pathways at FDR < 0.05.\n",
      "Try pvalueCutoff = 0.1, or double-check Entrez mapping.\n")
} else {
  dotplot(gsea_reactome, showCategory = 15) +
    ggtitle("Reactome GSEA - Ranked by limma t-statistic") +
    theme(plot.title = element_text(hjust = 0.5, face = "bold"))
}

cat(
  "Interpretation (Reactome GSEA — ranked by limma t-stat):\n",
  "- This uses the full ranked gene list (by limma t-stat), not just a 'significant genes' cutoff.\n",
  "- It detects pathways where many genes shift in a coordinated direction, even if individual genes are modest.\n",
  "- NES (Normalised Enrichment Score) indicates direction:\n",
  "    * Positive NES = enriched at the top of the ranking (more up in COPD if ranking is COPD-positive).\n",
  "    * Negative NES = enriched at the bottom (more down in COPD).\n",
  "- Very low adjusted p-values indicate strong, consistent pathway-level shifts.\n",
  sep = ""
)
## Interpretation (Reactome GSEA — ranked by limma t-stat):
## - This uses the full ranked gene list (by limma t-stat), not just a 'significant genes' cutoff.
## - It detects pathways where many genes shift in a coordinated direction, even if individual genes are modest.
## - NES (Normalised Enrichment Score) indicates direction:
##     * Positive NES = enriched at the top of the ranking (more up in COPD if ranking is COPD-positive).
##     * Negative NES = enriched at the bottom (more down in COPD).
## - Very low adjusted p-values indicate strong, consistent pathway-level shifts.

8.5 Step H.5 — Plot one enrichment curve (strongest evidence figure)

library(enrichplot)
library(ggplot2)

if (!is.null(gsea_reactome) && nrow(as.data.frame(gsea_reactome)) > 0) {

  gsea_df  <- as.data.frame(gsea_reactome)
  top_id   <- gsea_df$ID[1]
  top_desc <- gsea_df$Description[1]

  gseaplot2(
    gsea_reactome,
    geneSetID = top_id,
    title     = paste("GSEA Enrichment —", top_desc),
    color     = "firebrick",
    base_size = 13,
    rel_heights = c(1.5, 0.4, 0.8)
  )

} else {
  message("No significant Reactome GSEA results found — skipping enrichment plot.")
}

cat(
  "Interpretation (GSEA Enrichment Curve):\n",
  "- The enrichment curve shows the running enrichment score as you move down the ranked gene list.\n",
  "- Black tick marks show where genes from this pathway appear in the ranked list.\n",
  "- A peak near the left = pathway genes concentrated among the most upregulated genes in COPD.\n",
  "- A peak near the right = enrichment among downregulated genes.\n",
  "- The bottom bar shows the ranked list metric (e.g. t-statistic or log2FC).\n",
  "- This is often the most convincing single figure — it shows the enrichment signal directly.\n",
  sep = ""
)
## Interpretation (GSEA Enrichment Curve):
## - The enrichment curve shows the running enrichment score as you move down the ranked gene list.
## - Black tick marks show where genes from this pathway appear in the ranked list.
## - A peak near the left = pathway genes concentrated among the most upregulated genes in COPD.
## - A peak near the right = enrichment among downregulated genes.
## - The bottom bar shows the ranked list metric (e.g. t-statistic or log2FC).
## - This is often the most convincing single figure — it shows the enrichment signal directly.

9 Step I — Integrated Biological Interpretation

cat(
"Final Integrated Interpretation:\n\n",
"This transcriptomic analysis of COPD lung tissue reveals a coordinated molecular reprogramming characterised by extracellular matrix remodeling, altered SLIT–ROBO signaling, and suppression of core translational machinery.\n\n",
"PCA demonstrated that although COPD status does not dominate total variance (PC1), a significant disease-associated axis is present (PC2), indicating structured but heterogeneous transcriptional shifts.\n\n",
"Differential expression identified widespread dysregulation, and pathway-level enrichment confirmed that these changes are not random but reflect coordinated biological programs.\n\n",
"Rank-based Reactome GSEA further strengthened this conclusion by detecting consistent pathway-level shifts without arbitrary gene cutoffs, highlighting structural remodeling and translational suppression as dominant COPD-associated programs.\n\n",
"Together, these findings support a model in which COPD lung tissue undergoes chronic structural reorganization and stress-associated signaling reprogramming, rather than isolated gene-level perturbations.",
sep = ""
)
## Final Integrated Interpretation:
## 
## This transcriptomic analysis of COPD lung tissue reveals a coordinated molecular reprogramming characterised by extracellular matrix remodeling, altered SLIT–ROBO signaling, and suppression of core translational machinery.
## 
## PCA demonstrated that although COPD status does not dominate total variance (PC1), a significant disease-associated axis is present (PC2), indicating structured but heterogeneous transcriptional shifts.
## 
## Differential expression identified widespread dysregulation, and pathway-level enrichment confirmed that these changes are not random but reflect coordinated biological programs.
## 
## Rank-based Reactome GSEA further strengthened this conclusion by detecting consistent pathway-level shifts without arbitrary gene cutoffs, highlighting structural remodeling and translational suppression as dominant COPD-associated programs.
## 
## Together, these findings support a model in which COPD lung tissue undergoes chronic structural reorganization and stress-associated signaling reprogramming, rather than isolated gene-level perturbations.

10 Step J — Save outputs and report session info

Keeping saved tables makes your GitHub repo reproducible.

dir.create("results", showWarnings = FALSE)

write.csv(annotated, file.path("results", "limma_sig_strong_annotated.csv"), row.names = FALSE)
write.csv(as.data.frame(ego_up), file.path("results", "GO_up_COPD.csv"), row.names = FALSE)
write.csv(as.data.frame(ego_down), file.path("results", "GO_down_COPD.csv"), row.names = FALSE)
write.csv(as.data.frame(react_up), file.path("results", "Reactome_ORA_up.csv"), row.names = FALSE)
write.csv(as.data.frame(react_down), file.path("results", "Reactome_ORA_down.csv"), row.names = FALSE)
write.csv(as.data.frame(gsea_reactome), file.path("results", "Reactome_GSEA.csv"), row.names = FALSE)

11 Step K — Limitations and Considerations

cat(
"Limitations:\n",
"- Microarray platforms have limited dynamic range compared to RNA-seq.\n",
"- Bulk lung tissue mixes multiple cell types, potentially confounding cell composition effects.\n",
"- Smoking status, inflammation level, and batch effects may contribute to dominant variance axes.\n",
"- Causality cannot be inferred from cross-sectional differential expression.\n",
"- Functional validation would be required to confirm pathway-level conclusions.\n",
sep = ""
)
## Limitations:
## - Microarray platforms have limited dynamic range compared to RNA-seq.
## - Bulk lung tissue mixes multiple cell types, potentially confounding cell composition effects.
## - Smoking status, inflammation level, and batch effects may contribute to dominant variance axes.
## - Causality cannot be inferred from cross-sectional differential expression.
## - Functional validation would be required to confirm pathway-level conclusions.
cat(
"\n\nClinical Context:\n",
"The observed enrichment of extracellular matrix and signaling pathways aligns with known COPD pathology involving airway remodeling, fibrosis, and altered epithelial–stromal communication.\n",
sep = ""
)
## 
## 
## Clinical Context:
## The observed enrichment of extracellular matrix and signaling pathways aligns with known COPD pathology involving airway remodeling, fibrosis, and altered epithelial–stromal communication.
cat(
"\n\nMethodological Summary:\n",
"Data were quality filtered using detection p-values, log2-transformed, analyzed using limma with empirical Bayes moderation, and interpreted using both over-representation analysis and rank-based GSEA to ensure robustness of pathway-level findings.\n",
sep = ""
)
## 
## 
## Methodological Summary:
## Data were quality filtered using detection p-values, log2-transformed, analyzed using limma with empirical Bayes moderation, and interpreted using both over-representation analysis and rank-based GSEA to ensure robustness of pathway-level findings.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-3     ReactomePA_1.50.0      enrichplot_1.26.6     
##  [4] org.Hs.eg.db_3.20.0    AnnotationDbi_1.68.0   IRanges_2.40.1        
##  [7] S4Vectors_0.44.0       clusterProfiler_4.14.6 limma_3.62.2          
## [10] GEOquery_2.74.0        Biobase_2.66.0         BiocGenerics_0.52.0   
## [13] pheatmap_1.0.13        data.table_1.17.0      lubridate_1.9.4       
## [16] forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4           
## [19] purrr_1.0.4            readr_2.1.5            tidyr_1.3.1           
## [22] tibble_3.2.1           ggplot2_4.0.0          tidyverse_2.0.0       
## 
## loaded via a namespace (and not attached):
##   [1] rstudioapi_0.17.1           jsonlite_2.0.0             
##   [3] magrittr_2.0.4              ggtangle_0.1.1             
##   [5] farver_2.1.2                rmarkdown_2.29             
##   [7] fs_1.6.6                    zlibbioc_1.52.0            
##   [9] vctrs_0.6.5                 memoise_2.0.1              
##  [11] ggtree_3.14.0               htmltools_0.5.9            
##  [13] S4Arrays_1.6.0              curl_6.2.2                 
##  [15] SparseArray_1.6.2           gridGraphics_0.5-1         
##  [17] sass_0.4.10                 bslib_0.10.0               
##  [19] httr2_1.1.2                 plyr_1.8.9                 
##  [21] cachem_1.1.0                igraph_2.2.2               
##  [23] lifecycle_1.0.5             pkgconfig_2.0.3            
##  [25] Matrix_1.7-3                R6_2.6.1                   
##  [27] fastmap_1.2.0               gson_0.1.0                 
##  [29] GenomeInfoDbData_1.2.13     MatrixGenerics_1.18.1      
##  [31] digest_0.6.39               aplot_0.2.9                
##  [33] colorspace_2.1-1            patchwork_1.3.2            
##  [35] GenomicRanges_1.58.0        RSQLite_2.3.9              
##  [37] labeling_0.4.3              timechange_0.3.0           
##  [39] polyclip_1.10-7             httr_1.4.7                 
##  [41] abind_1.4-8                 compiler_4.4.2             
##  [43] bit64_4.6.0-1               withr_3.0.2                
##  [45] graphite_1.52.0             S7_0.2.0                   
##  [47] BiocParallel_1.40.0         viridis_0.6.5              
##  [49] DBI_1.2.3                   ggforce_0.5.0              
##  [51] R.utils_2.13.0              MASS_7.3-65                
##  [53] rappdirs_0.3.4              DelayedArray_0.32.0        
##  [55] tools_4.4.2                 ape_5.8-1                  
##  [57] rentrez_1.2.4               R.oo_1.27.1                
##  [59] glue_1.8.0                  nlme_3.1-168               
##  [61] GOSemSim_2.32.0             grid_4.4.2                 
##  [63] reshape2_1.4.4              fgsea_1.32.4               
##  [65] generics_0.1.3              gtable_0.3.6               
##  [67] tzdb_0.5.0                  R.methodsS3_1.8.2          
##  [69] hms_1.1.3                   tidygraph_1.3.1            
##  [71] xml2_1.3.8                  XVector_0.46.0             
##  [73] ggrepel_0.9.6               pillar_1.10.1              
##  [75] yulab.utils_0.2.4           splines_4.4.2              
##  [77] tweenr_2.0.3                treeio_1.30.0              
##  [79] lattice_0.22-6              bit_4.6.0                  
##  [81] tidyselect_1.2.1            GO.db_3.20.0               
##  [83] Biostrings_2.74.1           reactome.db_1.89.0         
##  [85] knitr_1.50                  gridExtra_2.3              
##  [87] SummarizedExperiment_1.36.0 xfun_0.51                  
##  [89] graphlayouts_1.2.3          statmod_1.5.1              
##  [91] matrixStats_1.5.0           stringi_1.8.7              
##  [93] UCSC.utils_1.2.0            lazyeval_0.2.2             
##  [95] ggfun_0.2.0                 yaml_2.3.10                
##  [97] evaluate_1.0.3              codetools_0.2-20           
##  [99] ggraph_2.2.2                qvalue_2.38.0              
## [101] graph_1.84.1                BiocManager_1.30.25        
## [103] ggplotify_0.1.3             cli_3.6.5                  
## [105] jquerylib_0.1.4             Rcpp_1.1.1                 
## [107] GenomeInfoDb_1.42.3         png_0.1-8                  
## [109] XML_3.99-0.18               parallel_4.4.2             
## [111] blob_1.2.4                  DOSE_4.0.1                 
## [113] viridisLite_0.4.2           tidytree_0.4.7             
## [115] scales_1.4.0                crayon_1.5.3               
## [117] rlang_1.1.7                 cowplot_1.2.0              
## [119] fastmatch_1.1-8             KEGGREST_1.46.0